Course: MLOps engineering
Author: Firas Jolha
The modeling phase is the ML-specific part of the process. This phase aims to specify one or several machine learning models to be deployed in production. The translation of the business problem into an ML task depends on what we are trying to solve. Constraints and requirements from the Business and Data Understanding phase shape this phase. For example, the application domain’s model assessment metrics might include performance metrics, robustness, fairness, scalability, interpretability, model complexity, and model resource demand. We should adjust the importance of each of these metrics according to the use case.
Generally, the modeling phase includes model selection, model specialization, and model training tasks. Additionally, depending on the application, we might use a pre-trained model, compress the model, or apply ensemble learning methods to get the final ML model.
One main complaint about machine learning projects is the lack of reproducibility. Therefore, we should ensure that the method and the results of the modeling phase are reproducible by collecting the model training metadata. Typically, we collect the following metadata: algorithm, training, validation, and testing datasets, hyperparameters, and a description of the runtime environment. Reproducibility of results also assumes validating the model’s mean performance across different random seeds. Following best practices, documenting trained models increases transparency and explainability in ML projects. A helpful framework here is the “Model Cards Toolkit”.
Many phases in ML development are iterative. Sometimes, we might need to revisit the business goals, KPIs, and available data from the previous steps to adjust the ML model results. Finally, we package the ML workflow in a pipeline to make model training repeatable during the modeling phase.
MLflow, at its core, provides a suite of tools aimed at simplifying the ML workflow. It is tailored to assist ML practitioners throughout the various stages of ML development and deployment.
MLflow’s functionalities are rooted in several foundational components. The core concepts behind its typical use cases are the following:
Runs
are executions of some piece of data science code, for example, a single python train.py execution. Each run records metadata (information about your run, such as metrics, parameters, and start and end times) and artifacts (output files from the run, such as model weights, images, etc.).
Experiments
An experiment groups together runs for a specific task. Experiments are logical containers for your runs. You can create an experiment using the CLI, API, or UI.
Model
An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.
Registered Model
An MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions, aliases, tags, and other metadata.
Model Version
Each registered model can have one or many versions. When a new model is added to the Model Registry, it is added as version 1. Each new model registered to the same model name increments the version number.
Model Alias
Model aliases allow you to assign a mutable, named reference to a particular version of a registered model. By assigning an alias to a specific model version, you can use the alias to refer to that model version via a model URI or the model registry API.
You can easily install MLflow as a Python package:
pip install mlflow
This will install the mlflow package and the mlflow command.
The MLflow command-line interface (CLI) provides a simple interface to various MLflow functionality. Here I show some of the commands:
mlflow server -h localhost -p 5000
mlflow experiments
mlflow experiments create --experiment-name "cli-exp"
mlflow experiments delete --experiment-id 868669161979377932
mlflow experiments restore --experiment-id 868669161979377932
# Only active experiments (default)
mlflow experiments search -v active_only
# Only deleted experiments
mlflow experiments search -v deleted_only
# All experiments
mlflow experiments search -v all
mlflow runs
mlflow runs list --experiment-id "868669161979377932"
mlflow runs describe --run-id "583005f6b9a847c4ac7448990d54df09"
mlflow runs delete --run-id "583005f6b9a847c4ac7448990d54df09"
mlflow runs restore --run-id "583005f6b9a847c4ac7448990d54df09"
mlflow artifacts list --run-id "583005f6b9a847c4ac7448990d54df09"
# Download run artifacts to a local directory, e.g. /tmp/my-artifact
# using run-id
mlflow artifacts download --run-id '583005f6b9a847c4ac7448990d54df09' -d '/tmp/my-artifact'
# using artifact-uri
mlflow artifacts download --artifact-uri 'runs:/583005f6b9a847c4ac7448990d54df09/iris_model' -d '/tmp/my-artifact'
mlflow artifacts log-artifact -l requirements.txt --run-id '583005f6b9a847c4ac7448990d54df09'
# Print useful information for diagnosing and debugging MLflow issues
mlflow doctor
# Permanently delete runs that are in the deleted lifecycle stage
mlflow gc
When you delete an experiment or run, its registered models are not deleted. If you want to delete the models too, you need to delete them manually.
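As a sketch of such manual cleanup, the Model Registry client API can remove a single model version or a whole registered model (assuming here a registered model named first_model, as in the example later in this tutorial):
from mlflow import MlflowClient

client = MlflowClient()
# Delete a single version of the registered model
client.delete_model_version(name="first_model", version="1")
# Or delete the registered model together with all of its versions
client.delete_registered_model(name="first_model")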
ML experimentation is one of the core practices in the model engineering phase. ML experiments are usually done to optimize ML models, try new methods, test hypotheses, etc. We need a system that helps us track these experiments and manage our ML workflows, so that we can go back and check our progress. MLflow is one of the common tools used to track and log ML models and experiments.
Assume we have the following code snippet for our ML modeling in short_ml.py:
# Iris data sets consists of 3 different types of irises’
# (Setosa, Versicolour, and Virginica) petal and sepal
# length, stored in a 150x4 numpy.ndarray
# The rows being the samples and the columns being:
# Sepal Length, Sepal Width, Petal Length and Petal Width.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Define the model hyperparameters
params = {
"solver": "lbfgs",
"penalty": "l2",
"random_state": 8888,
}
# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)
# Predict on the test set
y_pred = lr.predict(X_test)
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
print(accuracy, precision, recall, f1)
We can track experiments in MLflow for the previous code snippet as follows:
# mlflow server --host localhost --port 5000
mlflow server
# mlflow ui
This will start the MLflow tracking web server on port 5000 of localhost by default.
We do not need to set the tracking URI explicitly in code unless you are using a different port or hostname than the defaults:
import mlflow
# mlflow.set_tracking_uri(uri="http://localhost:5000")
# Alternatively, set the MLFLOW_TRACKING_URI environment variable
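For example, a minimal sketch of providing the tracking URI through the environment instead of in code (assuming the server above runs on localhost:5000):
import os
import mlflow

# Equivalent to calling mlflow.set_tracking_uri("http://localhost:5000") in code
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
print(mlflow.get_tracking_uri())  # should print http://localhost:5000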
The following is the code we want to use for training and evaluating our model. In this code, we are not yet tracking ML experiments.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Define the model hyperparameters
params = {
"solver": "lbfgs",
"penalty": "l2",
"random_state": 8888,
}
# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)
# Predict on the test set
y_pred = lr.predict(X_test)
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
print(accuracy, precision, recall, f1)
The steps are as follows:
We log the model and metadata after we finish training the model. So, do not include logging code inside training code.
import mlflow
from mlflow.models import infer_signature
import mlflow.sklearn
import mlflow.exceptions
# Set our tracking server uri for logging
# mlflow.set_tracking_uri(uri = "http://localhost:5000")
experiment_name = "MLflow-experiment-01"
try:
# Create a new MLflow Experiment
experiment_id = mlflow.create_experiment(name=experiment_name)
except mlflow.exceptions.MlflowException as e:
experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
print(experiment_id)
# Start an MLflow run
with mlflow.start_run(run_name="run-01", experiment_id=experiment_id) as run:
# Log the hyperparameters
mlflow.log_params(params=params)
# Log the performance metrics
mlflow.log_metric("accuracy", accuracy) # type: ignore
mlflow.log_metric("f1", f1) # type: ignore
mlflow.log_metrics({
"accuracy": accuracy,
"f1": f1
})
# Set a tag that we can use to remind ourselves what this run was for
mlflow.set_tag("Training Info", "Basic LR model for my data")
# Infer the model signature
signature = infer_signature(X_test, y_test)
# Log the model
model_info = mlflow.sklearn.log_model(
sk_model=lr,
artifact_path="LR_model",
signature=signature,
input_example=X_test,
registered_model_name="first_model"
)
In MLflow, a model signature precisely defines the schema for model inputs, outputs, and any additional parameters required for effective model operation.
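As an illustration, a signature can also be inferred from the model’s actual predictions instead of y_test; a small sketch reusing the variables defined above:
from mlflow.models import infer_signature

# Infer the input schema from X_test and the output schema from the model's predictions
signature = infer_signature(X_test, lr.predict(X_test))
print(signature)  # prints the inferred input and output schemas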
Notes:
- You can log standalone files to a run with the mlflow.log_artifact(local_path, artifact_path, run_id) function. Generally, you can use this function to log any standalone files you prefer, like images, text files, etc.
- Push your code to the main branch whenever you are ready to deploy your model, since pushing a commit to the main branch will trigger the CI/CD workflows which will be built in Phase 6 of this project.
- You can load the logged model as a generic Python function (pyfunc) and use it for inference.

The python_function (pyfunc) model flavor serves as a default model interface for MLflow Python models. Any MLflow Python model is expected to be loadable as a python_function model. This enables other MLflow tools to work with any Python model regardless of which persistence module or framework was used to produce the model.
import pandas as pd
# Load the model back for predictions as a generic Python Function model flavor
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
# Run predictions
predictions = loaded_model.predict(X_test)
iris_feature_names = datasets.load_iris().feature_names
# Compare some prediction results
result = pd.DataFrame(X_test, columns=iris_feature_names)
result["actual_class"] = y_test
result["predicted_class"] = predictions
result[:4]
You can run the file as a normal Python file using the python command.
After you run the file, you will notice two experiments: the one we created in the code and the Default experiment.
Locally, MLflow uses the mlruns folder as the backend store and model registry. It will store all artifacts there.
If you are getting an error like this:
RESOURCE_DOES_NOT_EXIST: Could not find experiment with ID 0
That means you deleted the Default experiment, whose id is 0 and which is used to hold any runs executed without a specific experiment. In order to fix this issue, you have to create a new experiment named Default and set its id to 0 manually.
(.venv) firasj@Lenovo:~/project$ mlflow experiments create -n Default
Created experiment 'Default' with id 529278589677091129
Then rename the folder mlruns/529278589677091129 to mlruns/0 and change its metadata (the meta.yaml file inside it) as follows:
# The path is different on your machine but it should end with `mlruns/0`
artifact_location: file:///home/firasj/project/mlruns/0
creation_time: 1720372282876 # Do not change this
experiment_id: '0' # Change this
last_update_time: 1720372282876 # Do not change this
lifecycle_stage: active # Do not change this
# Lifecycle stage of the experiment. Can either be ‘active’ or ‘deleted’.
name: Default # Do not change this
In order to see the results of our run, we can navigate to the MLflow UI. Since we have already started the Tracking Server at http://localhost:5000, we can simply navigate to that URL in our browser. When opening the site, you will see a screen similar to the following:
Clicking on the name of the Experiment that we created (“MLflow experiment 01”) will give us a list of runs associated with the Experiment. You should see a random name that has been generated for the run and nothing else showing up in the Table list view to the right.
Clicking on the name of the run will take you to the Run page, where the details of what we’ve logged will be shown. The elements have been highlighted below to show how and where this data is recorded within the UI.
You can also access all of the functions in the Tracking UI programmatically with MlflowClient. For example, the following code snippet searches for the run that has the best validation loss among all runs in the experiment.
client = mlflow.tracking.MlflowClient()
experiment_id = "0"
best_run = client.search_runs(
experiment_id, order_by=["metrics.val_loss ASC"], max_results=1
)[0]
print(best_run.info)
# {'run_id': '...', 'metrics': {'val_loss': 0.123}, ...}
mlflow API
The mlflow module provides a high-level “fluent” API for starting and managing MLflow runs. For example:
import mlflow
# Start the run
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
You can also use the context manager syntax like this:
with mlflow.start_run() as run:
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
which automatically terminates the run at the end of the with block.
The fluent tracking API is not currently thread-safe. Any concurrent callers to the tracking API must implement mutual exclusion manually.
mlflow.client API
The mlflow.client module provides a Python CRUD interface to MLflow Experiments, Runs, Model Versions, and Registered Models. This is a lower-level API that directly translates to MLflow REST API calls. Some of the key functions of this API are demonstrated as follows:
from mlflow import MlflowClient
import mlflow
# Client of an MLflow Tracking Server that creates and manages experiments and runs, and of an MLflow Registry Server that creates and manages registered models and model versions.
client = MlflowClient()
experiment_name = "my experiment"
# Create an experiment
try:
# Create a new MLflow Experiment
experiment_id = mlflow.create_experiment(name=experiment_name)
except mlflow.exceptions.MlflowException as e:
experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
# Create a mlflow.entities.Run object that can be associated with metrics, parameters, artifacts, etc. Unlike mlflow.projects.run(), creates objects but does not run code. Unlike mlflow.start_run(), does not change the “active run” used by mlflow.log_param().
run = client.create_run(experiment_id =experiment_id, run_name = "basic run")
run_id = run.info.run_id
model_name = "LR model"
model_path = model_name
# Create a new registered model in backend store.
rm = client.create_registered_model(name = model_name, description = "First LR model")
print(f"name: {rm.name}")
print(f"tags: {rm.tags}")
print(f"description: {rm.description}")
model_uri = f"runs:/{run_id}/{model_path}"
mv = client.create_model_version(name = model_name, source=model_uri, run_id = run_id)
print(f"Name: {mv.name}")
print(f"Version: {mv.version}")
print(f"Description: {mv.description}")
print(f"Status: {mv.status}")
print(f"Stage: {mv.current_stage}")
# Delete model version in backend.
client.delete_model_version(name = mv.name, version = mv.version)
# Delete registered model. Backend raises exception if a registered model with given name does not exist.
client.delete_registered_model(name = rm.name)
# Deletes a run with the given ID.
client.delete_run(run_id = run_id)
# Delete an experiment from the backend store. This deletion is a soft-delete, not a permanent deletion.
client.delete_experiment(experiment_id = experiment_id)
Sometimes you want to launch multiple MLflow runs in the same program: for example, maybe you are performing a hyperparameter search locally or your experiments are just very fast to run. The way to do this depends on whether you want to run them sequentially or in parallel.
# First run
with mlflow.start_run():
mlflow.log_param("x", 1)
mlflow.log_metric("y", 2)
...
# Another run
with mlflow.start_run():
...
MLflow also supports running multiple runs in parallel using multiprocessing or multithreading.
import mlflow
import multiprocessing as mp
def train_model(params):
with mlflow.start_run():
mlflow.log_param("p", params)
...
if __name__ == "__main__":
params = [0.01, 0.02, ...]
pool = mp.Pool(processes=4)
pool.map(train_model, params)
import mlflow
import threading
def train_model(params):
# Create a child run by passing nested=True
with mlflow.start_run(nested=True):
mlflow.log_param("p", params)
...
if __name__ == "__main__":
params = [0.01, 0.02, ...]
threads = []
for p in params:
t = threading.Thread(target=train_model, args=(p,))
threads.append(t)
t.start()
for t in threads:
t.join()
You can also create multiple runs inside a single run. This is useful for scenarios like hyperparameter tuning or cross-validation folds, where you need another level of organization within an experiment. You can create child runs by passing nested=True (or a parent_run_id) to the mlflow.start_run() function.
# Start parent run
with mlflow.start_run() as parent_run:
param = [0.01, 0.02, 0.03]
# Create a child run for each parameter setting
for p in param:
with mlflow.start_run(nested=True) as child_run:
mlflow.log_param("p", p)
...
mlflow.log_metric("val_loss", val_loss)
You can fetch all child runs under a parent run using tags. The MlflowClient.set_tag() function lets you add custom tags to runs. A tag can only have a single unique value mapped to it at a time. For example:
client.set_tag(run.info.run_id, "tag_key", "tag_value")
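In addition to custom tags, nested runs automatically receive the built-in tag mlflow.parentRunId, so the children of a parent run can be fetched by filtering on that tag. A minimal sketch, assuming the parent/child runs from the example above:
import mlflow

parent_run_id = parent_run.info.run_id  # parent_run from the nested-runs example above
child_runs = mlflow.search_runs(
    filter_string=f"tags.mlflow.parentRunId = '{parent_run_id}'"
)
# child_runs is a pandas DataFrame with one row per child run
print(child_runs[["run_id", "params.p", "metrics.val_loss"]])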
Autologging automatically logs your model, metrics, examples, signature, and parameters with only a single line of code for many of the most popular ML libraries in the Python ecosystem.
import mlflow
# Enable autologging
mlflow.autolog()
# Your ML modeling code is here.
You should call mlflow.autolog() before your training code.
In some cases, you may want to access the MLflow Run instance associated with the autologged results. You can access the most recent autolog run through the mlflow.last_active_run() function.
import mlflow
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
mlflow.autolog()
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)
# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)
autolog_run = mlflow.last_active_run()
print(autolog_run)
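Autologging can also be configured globally or disabled for a specific framework; a small sketch (the keyword arguments below are standard options of the autolog functions):
import mlflow
import mlflow.sklearn

# Configure global autologging: skip logging the fitted models, but log input examples
mlflow.autolog(log_models=False, log_input_examples=True)

# Disable autologging only for scikit-learn while keeping it enabled for other libraries
mlflow.sklearn.autolog(disable=True)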
There are two main abstract components associated with the mlflow.data module: Dataset and DatasetSource. The Dataset abstraction is a metadata tracking object that holds the information about a given logged dataset; concrete implementations include mlflow.data.pandas_dataset.PandasDataset and mlflow.data.tensorflow_dataset.TensorFlowDataset. The DatasetSource component of a Dataset represents the source of a dataset, such as a directory in S3, a Delta Table, or a URL.
The following example demonstrates how to construct a mlflow.data.pandas_dataset.PandasDataset object from a Pandas DataFrame:
import mlflow.data
import pandas as pd
from mlflow.data.pandas_dataset import PandasDataset
dataset_source_url = "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv"
raw_data = pd.read_csv(dataset_source_url, delimiter=";")
# Create an instance of a PandasDataset
dataset = mlflow.data.from_pandas(
raw_data, source=dataset_source_url, name="wine quality - white", targets="quality"
)
For this example, you need to install the xgboost package as follows:
pip install xgboost
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import xgboost
import mlflow
from mlflow.data.pandas_dataset import PandasDataset
dataset_source_url = "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv"
raw_data = pd.read_csv(dataset_source_url, delimiter=";")
# Extract the features and target data separately
y = raw_data["quality"]
X = raw_data.drop("quality", axis=1)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.33, random_state=17
)
# Create a label encoder object
le = LabelEncoder()
# Fit and transform the target variable
y_train_encoded = le.fit_transform(y_train)
y_test_encoded = le.transform(y_test)
# Fit an XGBoost binary classifier on the training data split
model = xgboost.XGBClassifier().fit(X_train, y_train_encoded)
# Build the Evaluation Dataset from the test set
y_test_pred = model.predict(X=X_test)
eval_data = X_test
eval_data["label"] = y_test
# Assign the decoded predictions to the Evaluation Dataset
eval_data["predictions"] = le.inverse_transform(y_test_pred)
# Create the PandasDataset for use in mlflow evaluate
pd_dataset = mlflow.data.from_pandas(
eval_data, predictions="predictions", targets="label"
)
mlflow.set_experiment("White Wine Quality")
# Log the Dataset, model, and execute an evaluation run using the configured Dataset
with mlflow.start_run() as run:
mlflow.log_input(pd_dataset, context="training")
mlflow.xgboost.log_model(
artifact_path="white-wine-xgb", xgb_model=model, input_example=X_test
)
result = mlflow.evaluate(data=pd_dataset, predictions=None, model_type="classifier")
# Retrieve the run information
logged_run = mlflow.get_run(run.info.run_id)
# Retrieve the Dataset object
logged_dataset = logged_run.inputs.dataset_inputs[0].dataset
# View some of the recorded Dataset information
print(f"Dataset name: {logged_dataset.name}")
print(f"Dataset digest: {logged_dataset.digest}")
print(f"Dataset profile: {logged_dataset.profile}")
print(f"Dataset schema: {logged_dataset.schema}")
The Tracking UI lets you visually explore your experiments and runs. It allows you to list and compare runs across experiments, search for runs by parameter or metric value, visualize run metrics, and download run artifacts and metadata for further analysis.
The artifact store persists (typically large) artifacts for each run, such as model weights (e.g. a pickled scikit-learn model), images (e.g. PNGs), and model and data files (e.g. Parquet files). MLflow stores artifacts in a local directory (mlruns) by default, but it also supports other storage options such as Amazon S3 and Azure Blob Storage.
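For instance, an experiment can be pointed at a remote artifact location when it is created; a minimal sketch, assuming a hypothetical S3 bucket and that the required credentials (and boto3) are available:
import mlflow

# artifact_location overrides the default local ./mlruns artifact store for this experiment
experiment_id = mlflow.create_experiment(
    name="s3-backed-experiment",                          # hypothetical experiment name
    artifact_location="s3://my-mlflow-bucket/artifacts",  # hypothetical bucket/prefix
)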
An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.
Each MLflow Model is a directory containing arbitrary files, together with an MLmodel file in the root of the directory that can define multiple flavors in which the model can be viewed.
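The flavors declared in the MLmodel file can also be inspected programmatically; a small sketch, assuming model_info returned by one of the log_model calls above:
from mlflow.models import get_model_info

info = get_model_info(model_info.model_uri)
# For a scikit-learn model, this typically includes the "python_function" and "sklearn" flavors
print(info.flavors)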
import mlflow
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from mlflow.models import infer_signature
import mlflow.sklearn
import mlflow.exceptions
# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Define the model hyperparameters
params = {
"solver": "lbfgs",
"max_iter": 1000, # Use hydra for configuration management
"random_state": 8888,
}
# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)
# Predict on the test set
y_pred = lr.predict(X_test)
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
print(accuracy, precision, recall, f1)
experiment_name = "MLflow experiment 01"
run_name = "run 01"
try:
# Create a new MLflow Experiment
experiment_id = mlflow.create_experiment(name=experiment_name)
except mlflow.exceptions.MlflowException as e:
experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
print(experiment_id)
with mlflow.start_run(run_name=run_name, experiment_id=experiment_id) as run:
# Log the hyperparameters
mlflow.log_params(params=params)
# Log the performance metrics
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1", f1)
mlflow.log_metrics({
"accuracy": accuracy,
"f1": f1
})
# Set a tag that we can use to remind ourselves what this run was for
mlflow.set_tag("Training Info", "Basic LR model for iris data")
# Infer the model signature
signature = infer_signature(X_test, y_test)
# Log the model
model_info = mlflow.sklearn.log_model(
sk_model=lr,
artifact_path="iris_model",
signature=signature,
input_example=X_test,
registered_model_name="LR_model_01",
pyfunc_predict_fn = "predict_proba"
)
sk_pyfunc = mlflow.sklearn.load_model(model_uri=model_info.model_uri)
predictions = sk_pyfunc.predict(X_test)
print(predictions)
eval_data = pd.DataFrame(y_test)
eval_data.columns = ["label"]
eval_data["predictions"] = predictions
results = mlflow.evaluate(
data=eval_data,
model_type="classifier",
targets= "label",
predictions="predictions",
evaluators = ["default"]
)
print(f"metrics:\n{results.metrics}")
print(f"artifacts:\n{results.artifacts}")
import numpy as np
import mlflow
from mlflow.models import infer_signature
import torch
from torch import nn
import pandas as pd
net = nn.Linear(10, 1)
loss_function = nn.L1Loss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
X = torch.randn(100, 10)
y = torch.randn(100, 1)
print(X.shape, y.shape)
epochs = 5
for epoch in range(epochs):
optimizer.zero_grad()
outputs = net(X)
loss = loss_function(outputs, y)
loss.backward()
optimizer.step()
with mlflow.start_run() as run:
signature = infer_signature(X.numpy(), net(X).detach().numpy())
model_info = mlflow.pytorch.log_model(
pytorch_model = net,
artifact_path = "pytorch model",
signature=signature,
input_example=X.numpy(),
registered_model_name="pytorch_model"
)
pytorch_pyfunc = mlflow.pyfunc.load_model(model_uri=model_info.model_uri)
X_test = torch.randn(20, 10).numpy()
predictions = pytorch_pyfunc.predict(X_test)
print(predictions)
eval_data = pd.DataFrame(y.numpy())
print(eval_data)
eval_data.columns = ["label"]
eval_data["predictions"] = net(X).detach().numpy()
print(eval_data.shape)
results = mlflow.evaluate(
data=eval_data,
model_type="regressor",
targets= "label",
predictions="predictions",
evaluators = ["default"]
)
print(f"metrics:\n{results.metrics}")
print(f"artifacts:\n{results.artifacts}")
We have two schemes of model URIs to retrieve models, as follows:
The runs scheme:
runs:/<run_id>/<model_artifact_path>
The models scheme:
# Fetch a specific model version
models:/<model_name>/<version>
# Fetch a model version by alias
models:/<model_name>@alias
We can use mlflow.pyfunc.load_model to load any model that has the pyfunc flavor from the model registry as follows:
import mlflow.pyfunc
from mlflow import MlflowClient
run_id = "e389609f9f1b44678ea7fea020453f94"
model_artifact_path = "pytorch model"
model = mlflow.pyfunc.load_model(model_uri=f"runs:/{run_id}/{model_artifact_path}")
print(model.metadata)
# OR
model_name = "pytorch_model"
model_version = 1
model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")
print(model.metadata)
# OR
client = MlflowClient()
client.set_registered_model_alias(name = model_name, alias = "staging", version = "1")
model_name = "pytorch_model"
model_alias = "staging"
model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}@{model_alias}")
print(model.metadata)
An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. In addition, the Projects component includes an API and command-line tools for running projects, making it possible to chain together projects into workflows.
Each project is simply a directory of files, or a Git repository, containing your code. Each project can specify several properties:
- Entry points: commands that can be run within the project; you can call any .py or .sh file in the project as an entry point.
- Environment: the software environment used to run the entry points; MLflow supports virtualenv/conda environments, the local environment (--env-manager=local), and Docker containers.

MLproject file
You can get more control over an MLflow Project by adding an MLproject file, which is a text file in YAML syntax, to the project’s root directory. The following is an example of an MLproject file:
# MLproject
name: Predicting customer satisfaction
python_env: python_env.yaml
# or
# conda_env: my_env.yaml
# or
# docker_env:
# image: mlflow-docker-example
entry_points:
main:
command: "python src/main.py"
# python_env.yaml
# Python version required to run the project.
python: "3.11"
# Dependencies required to build packages. This field is optional.
build_dependencies:
- pip
# Dependencies required to run the project.
dependencies:
- mlflow==2.7.3
- scikit-learn
- pandas
- zenml
- dvc
- giskard
# Add the rest of your dependencies here
For the project, you either:
- add --env-manager=local to every mlflow run command, or
- install pyenv to let MLflow create a virtual environment, as follows:
pip install pyenv --upgrade
After that, add the pyenv binaries to the PATH environment variable as follows:
# Add the line
echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
# load the file
source ~/.bashrc
# activate the virtual environment again
source .venv/bin/activate
Check that the file ~/.bashrc contains the line:
export PATH="$PATH:$HOME/.pyenv/bin"
When you do not want to run the MLflow project in a new virtual environment, but rather use the virtual environment of the local repository, you need to add the option --env-manager=local.
In the MLproject file, the command can be any string in Python format string syntax. MLflow allows specifying a data type and default value for each parameter:
parameter_name: data_type # Without default
parameter_name: {type: data_type, default: value} # Short syntax
parameter_name: # Long syntax
type: data_type
default: value
You can run any MLflow project from a Git URI or from a local directory using the mlflow run command-line tool (Method 1), or the mlflow.projects.run() Python API (Method 2).
Both tools take the following parameters:
- Project URI: a local directory or a Git URI of the form https://<repo> (to use HTTPS) or user@host:path (to use Git over SSH). To run against an MLproject file located in a subdirectory of the project, add a '#' to the end of the URI argument, followed by the relative path from the project's root directory to the subdirectory containing the desired project.
- Entry point: the name of the entry point to run, which defaults to main.
- Environment manager: by default, MLflow runs the project in its own environment; you can run it in the current system environment by supplying the --env-manager=local flag, but this can lead to unexpected results if there are dependency mismatches between the project environment and the current system environment.
mlflow run $PROJECTPATH -P alpha=0.5 -e main --env-manager local
import mlflow
import os
project_path = os.path.expandvars("$PROJECTPATH")
project_uri = project_path
params = {"alpha": 0.5, "l1_ratio": 0.01}
entry_point = "main"
env_manager = "local"
# Run MLflow project and create a reproducible conda environment
# on a local host
mlflow.projects.run(project_uri,
parameters=params,
entry_point=entry_point,
env_manager=env_manager)
mlflow run git@github.com:mlflow/mlflow-example.git -P alpha=0.5 -e main --env-manager local
import mlflow
project_uri = "https://github.com/mlflow/mlflow-example"
params = {"alpha": 0.5, "l1_ratio": 0.01}
entry_point = "main"
env_manager = "local"
# Run MLflow project and create a reproducible conda environment
# on a local host
mlflow.projects.run(project_uri,
parameters=params,
entry_point=entry_point,
env_manager=env_manager)
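As a side note, mlflow.projects.run() returns a SubmittedRun object that can be captured to inspect the launched run; a sketch reusing the variables defined above (executing it would launch the project one more time):
import mlflow

submitted = mlflow.projects.run(
    project_uri,
    parameters=params,
    entry_point=entry_point,
    env_manager=env_manager,
)
print(submitted.run_id)        # id of the MLflow run created for this project execution
print(submitted.get_status())  # e.g. "FINISHED" once the synchronous run completes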
By default, MLflow uses a new, temporary working directory for Git projects. This means that you should generally pass any file arguments to an MLflow project using absolute, not relative, paths. If your project declares its parameters, MLflow automatically makes paths absolute for parameters of type path.
When running an MLflow Project directory or repository that does not contain an MLproject file, MLflow uses the directory name as the project name and a Conda environment containing only the latest version of Python.
Runtime parameters are passed to the entry point on the command line using --key value syntax.
When we use Hydra for configuration management, it is easy to run multiple experiments with a single command line using the --multirun option.
python src/main.py --multirun
You can also activate the multirun mode of Hydra in config files rather than using the --multirun option, as follows:
# Add this to `configs/main.yaml`
hydra:
mode: MULTIRUN
Hydra will store the config results for multi-run experiments under the multirun folder.
By default, Hydra runs your multi-run jobs locally and serially. You can use the Joblib Launcher plugin, which provides a launcher for parallel tasks based on joblib.Parallel. It can be installed as a separate package as follows:
pip install hydra-joblib-launcher --upgrade
Add the following to your configs/main.yaml file:
# Override the default launcher
defaults:
- override hydra/launcher: joblib
# Set the number of parallel jobs
hydra:
launcher:
# override the number of jobs for joblib
n_jobs: 10 # it is 10 jobs in parallel now
After this change, Hydra will use the Joblib launcher whenever you use --multirun, as follows:
python src/app.py --multirun
Optuna is an open-source hyperparameter optimization framework that automates hyperparameter search. The Hydra Optuna Sweeper plugin can be installed as a separate package as follows:
pip install hydra-optuna-sweeper --upgrade
Add the following to your configs/main.yaml file:
# Override the default sweeper
defaults:
- override hydra/sweeper: optuna
# Configure the Optuna sweeper
hydra:
sweeper:
sampler:
seed: 8888
direction: maximize # minimize
study_name: LR_optimization
n_trials: 20 # number of times to try to optimize the search space
n_jobs: 1 # parallelism
params:
x: range(-5.5, 5.5, step=0.5)
y: choice(-5 ,0 ,5)
Optuna concepts
A study is an optimization session, i.e. a collection of trials, and a trial is a single call of the objective function with a specific set of suggested hyperparameter values.
Simple example:
"""
let's optimize a simple quadratic function: (x - 2)^2
"""
import optuna
# Objective function
def objective(trial):
x = trial.suggest_float("x", -10, 10)
return (x - 2) ** 2
study = optuna.create_study(study_name = "simple optimization example", direction="minimize")
study.optimize(objective, n_trials=100)
best_params = study.best_params
found_x = best_params["x"]
print("Found x: {}, (x - 2)^2: {}".format(found_x, (found_x - 2) ** 2))
study.best_params  # the best hyperparameter values found
study.best_value  # the best objective value
study.best_trial  # the best trial object
After this change, Hydra will use the Optuna sweeper whenever you use --multirun, as follows:
python src/app.py --multirun
Note: the function in src/app.py decorated with @hydra.main() should return a float which we want to minimize/maximize.
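A minimal sketch of such a function (a hypothetical src/app.py); the returned float is what the Optuna sweeper minimizes or maximizes across the --multirun trials:
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="../configs", config_name="main", version_base=None)
def main(cfg: DictConfig) -> float:
    # ... train and evaluate a model using cfg here ...
    val_f1 = 0.9  # placeholder; in practice, compute this on your validation data
    return val_f1

if __name__ == "__main__":
    main()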
After you run the optimization, you might find the optimization_results.yaml file (i.e. the best params and best value) under the multirun logs folder.
Hydra provides an override parser that supports a rich syntax:
- interval is converted to UniformDistribution. You can use IntUniformDistribution, LogUniformDistribution, or IntLogUniformDistribution by casting the interval to int and tagging it with log.
python src/app.py --multirun 'x=int(interval(-5.0, 5.0))' 'y=tag(log, interval(1, 10))'
- range is converted to IntUniformDistribution. If you apply shuffle to range, CategoricalDistribution is used instead. If any of range’s start, stop, or step is of type float, it will be converted to DiscreteUniformDistribution.
python src/app.py --multirun 'x=range(-5.0, 5.0)' 'y=shuffle(range(-5, 5))'
- choice is converted to CategoricalDistribution.
python src/app.py --multirun 'x=choice(-5.0, 0.0, 5.0)' 'y=choice(0, 1, 2, 3, 4, 5)'
Here I will show a demo of how we can use such tools in the project.
In the project, you need to use Hydra config files to store/retrieve the hyperparameters and settings of models and data. You can also use the joblib and optuna plugins to speed up hyperparameter optimization, but be careful of race conditions when logging to the same run.
# The line below is the filename and should be stored in the root directory of the repository.
# MLproject
name: <Project Title>
# python_env: python_env.yaml
entry_points:
main:
command: "python src/main.py" # --multirun"
evaluate:
command: "python src/evaluate.py"
# configs/main.yaml
defaults:
- _self_
- data/sample
- data_version
- model/model
- experiment
# - override hydra/launcher: joblib # submitit_local #joblib
# - override hydra/sweeper: optuna
# - override hydra/sweeper/sampler: grid
hydra:
mode: MULTIRUN
# launcher:
# n_jobs: -1
sweeper:
params:
+model: "rf, lr"
# +model: "rf"
# configs/experiment.yaml
experiment_name: "mlops_experiment"
run_name: "multi_run"
test_size: 0.2
random_state: 88
cv_n_jobs: -1
train_data_version: "v4"
test_data_version: "v5"
# configs/data_version.yaml
data_version: v4
# configs/model/model.yaml
defaults:
- _self_
folds: 3
evaluation_metric: "f1"
cv_evaluation_metric: "mean_test_f1"
pyfunc_predict_fn: "predict_proba"
metrics:
accuracy: "accuracy"
f1: "f1"
# hydra:
# sweeper:
# sampler:
# seed: 8888
# # n_trials: 20 # number of times to try to optimize the search space
# direction: maximize # minimize
# configs/model/lr.yaml
model_name: logistic_regression
artifact_path: basic_lr
tag_key: "model"
tag_value: "basic LR"
module_name: "sklearn.linear_model"
class_name: "LogisticRegression"
params:
# penalty: ['l1', 'l2']
solver: ["saga", "lbfgs", "liblinear"]
max_iter: [100, 200, 1000]
random_state: [88] #, 100, 44]
C: [0.1, 0.5, 0.9]
# C: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# C: np.arange(0.1, 1, 0.1)
# hydra:
# sweeper:
# params:
# +params.penalty: "'l1', 'l2'"
# +params.solver: "'saga'"
# +params.max_iter: "100, 200, 1000"
# +params.random_state: 88
# +params.C: range(start=0.1, stop=1, step=0.1)
# configs/model/rf.yaml
model_name: random_forest
artifact_path: basic_rf
tag_key: "model"
tag_value: "basic RF"
module_name: "sklearn.ensemble"
class_name: "RandomForestClassifier"
params:
n_estimators: [100, 200, 500]
criterion: ['gini', 'entropy', 'log_loss']
random_state: [88] #, 100, 44]
# hydra:
# sweeper:
# params:
# +params.n_estimators: "100, 200, 500"
# +params.criterion: "'gini', 'entropy', 'log_loss'"
# +params.random_state: 88
# src/model.py
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import GridSearchCV
from zenml.client import Client
import pandas as pd
import mlflow
import mlflow.sklearn
import importlib
def load_features(name, version, size = 1):
client = Client()
l = client.list_artifact_versions(name = name, tag = version, sort_by="version").items
l.reverse()
df = l[0].load()
df = df.sample(frac = size, random_state = 88)
print("size of df is ", df.shape)
print("df columns: ", df.columns)
X = df[df.columns[:-1]]
y = df.y
print("shapes of X,y = ", X.shape, y.shape)
return X, y
def log_metadata(cfg, gs, X_train, y_train, X_test, y_test):
cv_results = pd.DataFrame(gs.cv_results_).filter(regex=r'std_|mean_|param_').sort_index(axis=1)
best_metrics_values = [result[1][gs.best_index_] for result in gs.cv_results_.items()]
best_metrics_keys = [metric for metric in gs.cv_results_]
best_metrics_dict = {k:v for k,v in zip(best_metrics_keys, best_metrics_values) if 'mean' in k or 'std' in k}
# print(cv_results, cv_results.columns)
params = best_metrics_dict
df_train = pd.concat([X_train, y_train], axis = 1)
df_test = pd.concat([X_test, y_test], axis = 1)
experiment_name = cfg.model.model_name + "_" + cfg.experiment_name
try:
# Create a new MLflow Experiment
experiment_id = mlflow.create_experiment(name=experiment_name)
except mlflow.exceptions.MlflowException as e:
experiment_id = mlflow.get_experiment_by_name(name=experiment_name).experiment_id # type: ignore
print("experiment-id : ", experiment_id)
cv_evaluation_metric = cfg.model.cv_evaluation_metric
run_name = "_".join([cfg.run_name, cfg.model.model_name, cfg.model.evaluation_metric, str(params[cv_evaluation_metric]).replace(".", "_")]) # type: ignore
print("run name: ", run_name)
if (mlflow.active_run()):
mlflow.end_run()
# Fake run
with mlflow.start_run():
pass
# Parent run
with mlflow.start_run(run_name = run_name, experiment_id = experiment_id) as run:
df_train_dataset = mlflow.data.pandas_dataset.from_pandas(df = df_train, targets = cfg.data.target_cols[0]) # type: ignore
df_test_dataset = mlflow.data.pandas_dataset.from_pandas(df = df_test, targets = cfg.data.target_cols[0]) # type: ignore
mlflow.log_input(df_train_dataset, "training")
mlflow.log_input(df_test_dataset, "testing")
# Log the hyperparameters
mlflow.log_params(gs.best_params_)
# Log the performance metrics
mlflow.log_metrics(best_metrics_dict)
# Set a tag that we can use to remind ourselves what this run was for
mlflow.set_tag(cfg.model.tag_key, cfg.model.tag_value)
# Infer the model signature
signature = mlflow.models.infer_signature(X_train, gs.predict(X_train))
# Log the model
model_info = mlflow.sklearn.log_model(
sk_model = gs.best_estimator_,
artifact_path = cfg.model.artifact_path,
signature = signature,
input_example = X_train.iloc[0].to_numpy(),
registered_model_name = cfg.model.model_name,
pyfunc_predict_fn = cfg.model.pyfunc_predict_fn
)
client = mlflow.client.MlflowClient()
client.set_model_version_tag(name = cfg.model.model_name, version=model_info.registered_model_version, key="source", value="best_Grid_search_model")
for index, result in cv_results.iterrows():
child_run_name = "_".join(['child', run_name, str(index)]) # type: ignore
with mlflow.start_run(run_name = child_run_name, experiment_id= experiment_id, nested=True): #, tags=best_metrics_dict):
ps = result.filter(regex='param_').to_dict()
ms = result.filter(regex='mean_').to_dict()
stds = result.filter(regex='std_').to_dict()
# Remove param_ from the beginning of the keys
ps = {k.replace("param_",""):v for (k,v) in ps.items()}
mlflow.log_params(ps)
mlflow.log_metrics(ms)
mlflow.log_metrics(stds)
# We will create the estimator at runtime
module_name = cfg.model.module_name # e.g. "sklearn.linear_model"
class_name = cfg.model.class_name # e.g. "LogisticRegression"
# Load "module.submodule.MyClass"
class_instance = getattr(importlib.import_module(module_name), class_name)
estimator = class_instance(**ps)
estimator.fit(X_train, y_train)
# from sklearn.model_selection import cross_val_score
# scores = cross_val_score(estimator=estimator,
# X_train,
# y_train,
# cv = cfg.model.folds,
# n_jobs=cfg.cv_n_jobs,
# scoring=cfg.model.cv_evaluation_metric)
# cv_evaluation_metric = scores.mean()
signature = mlflow.models.infer_signature(X_train, estimator.predict(X_train))
model_info = mlflow.sklearn.log_model(
sk_model = estimator,
artifact_path = cfg.model.artifact_path,
signature = signature,
input_example = X_train.iloc[0].to_numpy(),
registered_model_name = cfg.model.model_name,
pyfunc_predict_fn = cfg.model.pyfunc_predict_fn
)
model_uri = model_info.model_uri
loaded_model = mlflow.sklearn.load_model(model_uri=model_uri)
predictions = loaded_model.predict(X_test) # type: ignore
eval_data = pd.DataFrame(y_test)
eval_data.columns = ["label"]
eval_data["predictions"] = predictions
results = mlflow.evaluate(
data=eval_data,
model_type="classifier",
targets="label",
predictions="predictions",
evaluators=["default"]
)
print(f"metrics:\n{results.metrics}")
# mlflow.end_run()
# mlflow.end_run()
def train(X_train, y_train, cfg):
# Define the model hyperparameters
params = cfg.model.params
# Train the model
module_name = cfg.model.module_name # e.g. "sklearn.linear_model"
class_name = cfg.model.class_name # e.g. "LogisticRegression"
# We will create the estimator at runtime
import importlib
# Load "module.submodule.MyClass"
class_instance = getattr(importlib.import_module(module_name), class_name)
estimator = class_instance(**params)
# Grid search with cross validation
from sklearn.model_selection import StratifiedKFold
cv = StratifiedKFold(n_splits=cfg.model.folds, random_state=cfg.random_state, shuffle=True)
param_grid = dict(params)
scoring = list(cfg.model.metrics.values()) # ['balanced_accuracy', 'f1_weighted', 'precision', 'recall', 'roc_auc']
evaluation_metric = cfg.model.evaluation_metric
gs = GridSearchCV(
estimator = estimator,
param_grid = param_grid,
scoring = scoring,
n_jobs = cfg.cv_n_jobs,
refit = evaluation_metric,
cv = cv,
verbose = 1,
return_train_score = True
)
gs.fit(X_train, y_train)
return gs
def retrieve_model_with_alias(model_name, model_alias = "champion") -> mlflow.pyfunc.PyFuncModel:
best_model:mlflow.pyfunc.PyFuncModel = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}@{model_alias}")
# best_model
return best_model
def retrieve_model_with_version(model_name, model_version = "v1") -> mlflow.pyfunc.PyFuncModel:
best_model:mlflow.pyfunc.PyFuncModel = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")
# best_model
return best_model
# src/main.py
import hydra
from model import train, load_features, log_metadata
from omegaconf import OmegaConf
def run(args):
cfg = args
train_data_version = cfg.train_data_version
X_train, y_train = load_features(name = "features_target", version=train_data_version)
test_data_version = cfg.test_data_version
X_test, y_test = load_features(name = "features_target", version=test_data_version)
# print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
gs = train(X_train, y_train, cfg=cfg)
log_metadata(cfg, gs, X_train, y_train, X_test, y_test)
@hydra.main(config_path="../configs", config_name="main", version_base=None) # type: ignore
def main(cfg=None):
# print(OmegaConf.to_yaml(cfg))
run(cfg)
if __name__=="__main__":
main()
You can run the project using the mlflow run command.
mlflow run . --env-manager=local
Here I am using the current virtual environment (--env-manager=local) for running the MLflow project, and it is enough for now.
Master’s students, who will probably work with PyTorch, can use the skorch package to wrap deep networks and benefit from the GridSearch functionality of sklearn. Check the links below for more info.
Note: The project tasks are graded, and they form the practice part of the course. We have tasks for the repository as well as for the report (for Master’s students).
Here, use only one version (e.g. the first) of your data samples for training your model.
- Create an MLproject file with one entry point main to run src/main.py with the --multirun option. You should decorate its main function with @hydra.main. We are using multirun to run multiple experiments by overriding parameters at runtime. The overridden parameters will be added to the configs/main.yaml file as explained in the demo above. You can also add overridden parameters in the MLproject file when you run the script, but I prefer the former approach.
- The main function in src/main.py will extract the features from the ZenML artifact store, train a model, evaluate it via cross-validation, and log the metadata. Keep your code clean and organize your project in different modules such that it is easy to maintain. For instance, you can create a function extract_data for the first task, another function train for the second task, evaluate for the third task, and log_metadata for the fourth task. Then you call them inside the main function. Try to put the functions which deal with data in the src/data.py module and the functions which deal with models in src/model.py.
- Data sample version v1 is used for training and validation, and v2 is used for testing the models.
- Use mlflow.evaluate to log the metrics. Log all hyperparameters you optimized.
- Name the runs using the evaluation metric value, e.g. acc_0.78322932. Use the same model name for registering all models and let the version increment.
- Use cross-validation with k=3 folds.
- Log the cross-validation runs as child runs (nested=True).
- Create a results folder in your repository. You can download artifacts (images, models, etc.) from the MLflow store as follows:
mlflow.artifacts.download_artifacts(artifact_uri, dst_path)
# OR
mlflow.artifacts.download_artifacts(run_id, artifact_path, dst_path)
# artifact_uri usually starts with runs:/<run_id>/path/to/artifact
- Save the evaluation results to the results folder as a text file.
- Assign the alias staging to the models you select. You can select more than one model per architecture. Assign an alias champion for the best model among all of these models and other aliases challenger1, challenger2, challenger3, etc. to the other models (an alias cannot be used more than once per registered model). It is enough to have one model as champion and one model as challenger.
- Store the selected models in the models folder.
- Add an entry point evaluate to the MLproject file to evaluate the selected model given the data sample version and model alias. By default, we assume that we evaluate the first sample on the champion model. These arguments can be passed to the function evaluate in src/evaluate.py using config files or directly in the MLproject file.
In production, the model that is currently deployed is called the Champion model, whereas the model(s) being tested in production are called the Challenger models. Here we do not have models in production, but we aim to deploy the champion model. For the next iterations of the project, we will use shadow testing to test the champion model (already in production) against challenger models (candidates to replace the champion model) on the same input data.
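A minimal sketch of assigning such aliases through the registry client, assuming a hypothetical registered model named random_forest with two versions already registered:
import mlflow.pyfunc
from mlflow import MlflowClient

client = MlflowClient()
client.set_registered_model_alias(name="random_forest", alias="champion", version="2")
client.set_registered_model_alias(name="random_forest", alias="challenger1", version="1")

# Later, the champion model can be loaded through its alias
champion = mlflow.pyfunc.load_model("models:/random_forest@champion")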
You should use Hydra for configuration management and running multiple experiments.
Complete the following chapters: